LANGUAGE MODELS FOR TOPIC TRACKING The importance of score normalization

نویسندگان

  • Wessel Kraaij
  • Martijn Spitters
چکیده

Generative unigram language models have proven to be a simple though effective model for information retrieval tasks. In contrast to ad-hoc retrieval, topic tracking requires that matching scores are comparable across topics. Several ranking functions based on generative language models: straight likelihood, likelihood ratio, normalized likelihood ratio, and the related Kullback-Leibler divergence are evaluated in two orientations. Best performance is achieved by the models based on a normalized log-likelihood ratio. Key component of these models is the a-priori probability of a story with respect to a common reference dis-

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The importance of score normalization

Generative unigram language models have proven to be a simple though effective model for information retrieval tasks. In contrast to ad-hoc retrieval, topic tracking requires that matching scores are comparable across topics. Several ranking functions based on generative language models: straight likelihood, likelihood ratio, normalized likelihood ratio, and the related Kullback-Leibler diverge...

متن کامل

Tdt-2004: Adaptive Topic Tracking at Maryland

A topic tracking system that combines elements from vector space and language modeling frameworks to compute document scores is described. The model is used for both the traditional TDT topic tracking evaluation design and the new supervised adaptive topic tracking evaluation. Results indicate that supervised adaptation and score normalization should be more closely coupled, and that current te...

متن کامل

Model Selection Based on Tracking Interval Under Unified Hybrid Censored Samples

The aim of statistical modeling is to identify the model that most closely approximates the underlying process. Akaike information criterion (AIC) is commonly used for model selection but the precise value of AIC has no direct interpretation. In this paper we use a normalization of a difference of Akaike criteria in comparing between the two rival models under unified hybrid cens...

متن کامل

Tracking Interval for Type II Hybrid Censoring Scheme

The purpose of this paper is to obtain the tracking interval for difference of expected Kullback-Leibler risks of two models under Type II hybrid censoring scheme. This interval helps us to evaluate proposed models in comparison with each other. We drive a statistic which tracks the difference of expected Kullback–Leibler risks between maximum likelihood estimators of the distribution in two diff...

متن کامل

تشخیص دست‌نوشتۀ‌ برخط فارسی با استفاده از مدل زبانی و کاهش قوانین نگارش کاربر

The Joint-up, cursive form of Persian words and immense variety of its scripts, also different figures of Persian letters depending on their sitting positions in the words, have turned the Persian handwritings recognition to an intense challenge. The major obstacle of the most often recognition ways, is their inattention to sentence contexture which causes utilizing of a word with correct appea...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003